Mostly Exploration-Free Algorithms for Contextual Bandits
Authors
Abstract
The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surprisingly, we find that a simple greedy algorithm can be rate-optimal if there is sufficient randomness in the observed contexts. We prove that this is always the case for a two-armed bandit under a general class of context distributions that satisfy a condition we term covariate diversity. Furthermore, even absent this condition, we show that a greedy algorithm can be rate-optimal with nonzero probability. Thus, standard bandit algorithms may unnecessarily explore. Motivated by these results, we introduce Greedy-First, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy algorithm or to explore. We prove that this algorithm is rate-optimal without any additional assumptions on the context distribution or the number of arms. Extensive simulations demonstrate that Greedy-First successfully reduces experimentation and outperforms existing (exploration-based) contextual bandit algorithms such as Thompson sampling or UCB.
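For intuition, the sketch below shows an exploration-free greedy policy for a linear contextual bandit together with a Greedy-First-style switch to exploration. The linear reward model, the eigenvalue growth test, and all names here are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

class GreedyFirstSketch:
    """Hedged sketch: greedy linear contextual bandit with a
    Greedy-First-style switch.  Rewards are assumed linear in the
    context, r = x @ theta_a + noise.  The switching test (minimum
    eigenvalue of each arm's Gram matrix growing linearly in t) is an
    illustrative proxy for the paper's data-driven criterion."""

    def __init__(self, n_arms, dim, growth=0.05):
        self.A = [np.eye(dim) for _ in range(n_arms)]   # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]
        self.t = 0
        self.growth = growth        # assumed linear-growth threshold
        self.exploring = False      # set True once the greedy test fails

    def greedy_arm(self, x):
        # Pure exploitation: pick the arm with the highest estimated reward.
        est = [x @ np.linalg.solve(A, b) for A, b in zip(self.A, self.b)]
        return int(np.argmax(est))

    def choose(self, x):
        self.t += 1
        if self.exploring:
            # Fall back to any rate-optimal exploring policy (e.g., UCB);
            # uniform random is only a placeholder here.
            return np.random.randint(len(self.A))
        # Greedy-First-style check: if contexts are diverse enough, every
        # arm's Gram matrix should gain information at a linear rate.
        if min(np.linalg.eigvalsh(A)[0] for A in self.A) < self.growth * self.t:
            self.exploring = True
            return np.random.randint(len(self.A))
        return self.greedy_arm(x)

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

When contexts are drawn from a covariate-diverse distribution, the greedy branch keeps pulling every arm often enough that the test never trips; if the contexts are degenerate, the test fails and the policy starts exploring.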
Similar resources
Exponentiated Gradient LINUCB for Contextual Multi-Armed Bandits
We present Exponentiated Gradient LINUCB, an algorithm for contextual multi-armed bandits. This algorithm uses Exponentiated Gradient to learn the optimal exploration rate of LINUCB. Within a deliberately designed offline simulation framework, we conduct evaluations with real online event log data. The experimental results demonstrate that our algorithm outperforms the surveyed algorithms.
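As a rough illustration of the idea, the following sketch layers exponentiated-gradient (EXP3-style) weights over a candidate set of LinUCB exploration rates. The candidate grid, the learning rate eta, and the importance-weighted update are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

class EGLinUCBSketch:
    """Illustrative sketch: LinUCB whose exploration rate alpha is
    sampled from exponentiated-gradient weights over a candidate set."""

    def __init__(self, n_arms, dim, alphas=(0.0, 0.5, 1.0, 2.0), eta=0.1):
        self.A = [np.eye(dim) for _ in range(n_arms)]  # ridge Gram matrix per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]
        self.alphas = np.array(alphas)                 # candidate exploration rates
        self.w = np.ones(len(alphas))                  # EG weights over candidates
        self.eta = eta

    def choose(self, x):
        # Sample an exploration rate in proportion to its EG weight.
        p = self.w / self.w.sum()
        j = np.random.choice(len(self.alphas), p=p)
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Standard LinUCB score: estimate plus width scaled by alpha.
            scores.append(x @ theta + self.alphas[j] * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores)), j, p[j]

    def update(self, arm, j, p_j, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
        # Importance-weighted multiplicative update for the sampled alpha.
        self.w[j] *= np.exp(self.eta * reward / p_j)
```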
Exploiting the Natural Exploration In Contextual Bandits
The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surp...
Avoiding the Exploration-Exploitation Tradeoff in Contextual Bandits
The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in many practical settings where exploration may be prohibitively costly or unethical (e.g., cli...
Estimation Considerations in Contextual Bandits
Although many contextual bandit algorithms have similar theoretical guarantees, the characteristics of real-world applications oftentimes result in large performance dissimilarities across algorithms. We study a consideration for the exploration vs. exploitation framework that does not arise in non-contextual bandits: the way exploration is conducted in the present may affect the bias and varia...
Exploration-Free Policies in Dynamic Pricing and Online Decision-Making
Growing availability of data has enabled practitioners to tailor decisions at the individual level. This involves learning a model of decision outcomes conditional on individual-specific covariates or features. Recently, contextual bandits have been introduced as a framework to study these online and sequential decision making problems. This literature predominantly focuses on algorithms that ba...
Publication date: 2018